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Q_!' Abstract 

A public database system archiving a direct numerical simulation (DNS) data set of 
isotropic, forced turbulence is described in this paper. The data set consists of the DNS 
output on 1024 3 spatial points and 1024 time-samples spanning about one large-scale turn- 
over timescale. This complete 1024 4 space-time history of turbulence is accessible to users 
remotely through an interface that is based on the Web-services model. Users may write and 
execute analysis programs on their host computers, while the programs make subroutine-like 
calls that request desired parts of the data over the network. The users are thus able to 
perform numerical experiments by accessing the 27 Terabytes of DNS data using regular 
platforms such as laptops. The architecture of the database is explained, as are some of 
^1 , the locally defined functions, such as differentiation and interpolation. Test calculations are 

performed to illustrate the usage of the system and to verify the accuracy of the methods. 
The database is then used to analyse a dynamical model for small-scale intermittency in tur- 
bulence. Specifically, the dynamical effects of pressure and viscous terms on the Lagrangian 
evolution of velocity increments are evaluated using conditional averages calculated from the 
DNS data in the database. It is shown that these effects differ considerably among themselves 
and thus require different modeling strategies in Lagrangian models of velocity increments 
and intermittency. 
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With the advent of petascale conrputing, it will become possible to simulate turbulent flows 
that cover over 4 orders of magnitude of spatial scales in each direction. For example, a simu- 
lation of turbulent flow that outputs 4 field variables on O[(10 4 ) 3 ] spatial points, when stored 
on 10 4 time-frames leads to datasets of about 160 petabytes. For more complex problems, such 
as turbulent advection of passive scalars [U [2] and magnetohydrodynamic turbulence [3], the 
S^ amount of data will increase even further. Such large datasets, however, create serious new chal- 

lenges when attempting to translate the massive amounts of data into meaningful knowledge 
and discovery about turbulence. The natural answer to this challenge is to seek the application 
of "database technology" in Computational Fluid Dynamics (CFD) and turbulence research. 
Database technology more broadly has focused on seeking automated ways to identify patterns 
and reduced-order descriptions, develop machine learning, perform data mining, and so on, in or- 
der to reduce the amount of data to be processed and transmitted. One example of such projects 
is the Sloan Digital Sky Survey (SDSS) project (for a list of similar projects see the Appendix E 
of |4J). The project aims at producing a comprehensive astronomical data archive, and providing 
astronomers the ability to explore the multi-terabyte data interactively and remotely through 
the Internet [5]. The data archive has led to over one thousand research articles published by 
researchers from all over the world. While there are several fields in which this approach has 



been embraced, the database approach has, to date, not been widely embraced by the CFD re- 
search community. For instance, in the area of Direct Numerical Simulations (DNS) of turbulent 
flows, existing efforts to share large datasets PdlH] either (1) assume that the entire or most 
of the data are downloaded and the user must provide the computational resources for analysis 
of the large data sets, or (2) the user must establish accounts, privileges, and be granted access 
to large fractions of memory at a high-perforance computing (HPC) facility, typically where the 
data sets were generated and stored, in order to run analysis programs there. While valuable 
contributions, such approaches do not fully leverage existing database technologies, which should 
support scientists with flexible and user-friendly tools without requiring extensive initial efforts 
to set up specialized access to HPC facilities. Also, in order to accommodate the majority of 
researchers with more limited computational infrastructure, database approaches should allow 
scientists access to relevant parts of the data without having to download significant fractions of 
the data to their computers. The prevailing environment without such effective database sup- 
port has hampered wide accessibility of large turbulence and CFD datasets. This problem is not 
expected to improve substantially even with faster network technologies. While data transfer 
is getting faster, the size of the "top-ranked simulations" is growing faster still. Thus, without 
developing new approaches, the top-ranked turbulence simulations may remain accessible only 
to a small subset of the research community. 

As a step in the direction of applying database techniques to turbulence research, in this 
paper we describe the construction and application of a 27 Terabyte database cluster containing 
the 1024 4 space-time history of forced isotropic turbulence obtained using DNS. The database 
archives a time series of velocity and pressure fields of isotropic stationary turbulence. The data 
are obtained by DNS with 1024 3 grid points in a periodic box. 1024 time-samples spanning 
about one large-scale turn-over time scale are stored. A user interface is layered on top of the 
database, so that the users can access the data via the Internet. A function set implemented in 
the database allows flexible in situ data processing for basic operations such as differentiation 
and interpolation. While some of the technical details about the design of the database have 
been described in [9j , in this paper we give a further description, with emphasis on how it may 
be used for turbulence research. We also provide a description of the functions available to the 
user. Then several test cases are presented to illustrate the usage of the database. The tool 
is then applied to the analysis of a Lagrangian model of turbulence small-scale intermittency 
proposed recently [101 [TJ] . 

Turbulence intermittency refers to the infrequent but strong fluctuations in the parameters 
characterizing small-scale turbulent motions, such as velocity increments, velocity gradients, and 
dissipation rates. Due to small-scale intermittency, it is observed in experiments and simulations 
that the probability density functions (PDF) of the small scale quantities develop exponential 
or stretched exponential tails. Also, the scaling exponents of the moments of the velocity 
increments deviate significantly from the prediction of the classical Kolmogorov theory |12[ 
[TBI 114] . Thus the quantitative prediction of small-scale intermittency behavior has been one 
of the major fundamental challenges in turbulence research. Substantial progresses have been 
made in the phenomenological modeling and related 'toy models' of small scale intermittency 
[IH EH El [El [THl [20] . However, these models generally made little direct connection with the 
Navier-Stokes equations. A recent development has been to couple the dynamics of velocity 
gradients with the Lagrangian evolution of the small scale geometrical features of turbulence. 
Geometrical information has been invoked in the tetrad model [21] , in the material deformation 
tensor evolution|22, 23J, and material line elements |10[ lllj . In Refs. |10[ 111] , the velocity 
increments defined over a line segment advected by a turbulent velocity field have been studied. 



A simple dynamical system describing the Lagrangian evolution of the velocity increments was 
derived. In this so-called "advected delta-vee" system, the nonlinear self-interaction terms in the 
Navier-Stokes equations are reduced to some simple closed terms. Neglecting the unclosed terms, 
the truncated system was shown to reproduce several important trends concerning intermittency, 
such as the skewness and elongated tails in the PDFs of the velocity increments. 

However, since several important pressure and viscous terms were neglected in the "advected 
delta-vee" system, the system does not approach a stationary state and instead diverges towards 
unphysical behavior at large times. As concluded in |llj . quantitative prediction of intermittency 
requires the modeling of the unclosed terms. In this paper, the terms are analyzed using the 
DNS data in the turbulence database. Following the statistical approach of [21], we consider 
the effects of the pressure and viscous terms based on the probability fluxes they produce in the 
equation for the joint probability distribution function (PDF) of the velocity increments. 

The paper is organized as follows: In ^21 some design principles of the database are described, 
and the dataset is also documented. Several standard calculations are conducted in $3j which 
include the one-dimensional energy spectra of the velocity fields [14] . the PDFs of elements of 
the velocity gradients Aij = dui/dxj, and the joint PDF of the two independent invariants of 
the velocity gradient Q = —AijAji/2 and R = —AijAjkAki/3 [25]. The statistical descriptions 
are obtained by running programs on local desktop computers, while obtaining the data from 
the database remotely. The analysis of the advected delta-vee system is presented in 21 The 
conclusions are given in $5] 

2 Architecture of the database cluster 
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Figure 1: The architecture of the turbulence database cluster. 



From the point of view of a user, it is desirable that a database cluster should, among 
others: (1) be available to the public by allowing remote access through the network; (2) provide 
flexible and comprehensive data processing functionalities inside the database so as to reduce 
data download; (3) be compatible with different platforms; (4) be compatible with the programs 
and tools that have been developed and used by the researchers in the past; (5) be efficient. 



With these requirements in mind, a cluster of database servers has been designed according 
to the diagram shown in Fig. [TJ A time series of 1024 3 DNS data of isotropic, steady state 
turbulence is stored in the database cluster. A total of 1024 time steps, or 27 terabytes of data, 
have been loaded. The data are partitioned spatially and placed on different database nodes in 
the cluster. Most of the nodes have two dual core AMD Opteron processors with 4GB memory, 
while some have two Intel Xeon quad-core processors. The operating system is 64-bit Microsoft 
Windows 2003 Server. The dataset is managed by Microsoft SQL Sever 2005 (64 bit). For more 
details see [9]. 

The users connect to the database via the Internet (at http://turbulence.pha.jhu.edu) and 
access the data through a data access server. The communication between the users and the 
data access server adopts the Web service model. The data access server is also the head-node 
of the database cluster. It communicates with other database servers (nodes) in the cluster 
over high-speed local area network. The data access server is a mediator: it receives requests 
from the users, breaks the request into parts and sends the parts to the individual database 
nodes. The majority of the calculations are done in the Computational Module in each node, 
embodying the "move the program to the data" principle in scientific database design [5] . High 
efficiency is achieved through a high degree of parallelism. The details of the turbulence dataset 
and details of several components of the database cluster are described in the following several 
sub-sections. 

2.1 Turbulence data set 

The data is from a DNS of forced isotropic turbulence on a 1024 3 periodic grid in a [0, 27r] 3 - 
domain, using a pseudo-spectral parallel code. Time integration of the viscous term is done 
analytically using the exact integrating factor. The other terms are integrated using a second- 
order Adams-Bashforth scheme and the nonlinear term is written in vorticity form |26j. The 
simulation is de-aliased using phase-shift and 2y2/3 spherical truncation [27], so that the ef- 
fective maximum wavenumber is about /c max = 1024V2/3 ~ 482. Energy is injected by keeping 
constant the total energy in modes such that their wave-number magnitude is less or equal to 
2. The simulation time-step is At = 0.0002. After the simulation has reached a statistical 
stationary state, 1024 frames of data, which includes the 3 components of the velocity vector 
and the pressure, are generated in physical space and ingested into the database. The data are 
stored at every 10 DNS time-setps, i.e. the samples are stored at time-step 5t = 0.002. The 
duration of the stored data is 1024 x 0.002 = 2.048, i.e. about one large-eddy turnover time since 
T = L/vl ~ 2.02 (L is the integral scale). The parameters of the simulation are given in Table 
12. 1L The total kinetic energy, dissipation rate, and energy spectra are computed by averaging 
in time between t = and t = 2.048. 

Figure [2] shows the radial energy spectrum computed from the simulation and averaged 
between t = and t = 2.048. Figures [3] and H] show time-series of the total turbulent kinetic 
energy and Taylor-scale based Reynolds numbers starting near the simulation initial condition. 
The values corresponding to the data in the database are for t > and are shown in solid line 
portions. 

2.2 User interface 

The Web service model is adopted to implement the communication between the users and the 
database. The Web service technique makes possible the automatic interaction between the 
database and a computational program running on a user's desktop or laptop computer. The 
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Table 1: The parameters of the DNS data in the turbulence database. 

program can fetch data from the database whenever data are needed, while performing further 
data analysis on the local machine between data requests. To realize this functionality, the user 
interface includes two parts: the Web-service methods implemented on the database access server 
and the client-side wrapper interface. Web-service methods are based on the standard protocol 
SOAP (Simple Object Access Protocol) [28J. They can be easily called from some modern 
programming languages such as Java or C#. Unfortunately, other languages need additional 
libraries to use Web-services. For example while recent versions of MATLAB offer integrated 
Web-service support, it is still necessary to provide a wrapper function to perform complex data 
type conversions. For FORTRAN and C codes running in UNIX, Linux or MacOS, we have 
developed and provided FORTRAN and C wrapper interfaces to the gSOAP library [29], which 
establishes the communication between the client code and the Web-service methods. 

A list of Web-service methods are implemented on the data access server [30] . These methods 
provide the entry points to the processing functions implemented on the database servers (the 
"Computational module" in Fig. [H will be explained below). The data access server has 
the knowledge of the data partition among the database servers. When a request is received 
from the user, the data access server find out on which database servers the requested data is 
stored, and then break the request into parts and send each part to the corresponding server. 
Upon receiving the requests, the Computational Modules on the database servers are invoked 
to perform necessary computations and return the requested data to the Web-service methods 
on the data access server (see Fig. [1]). The Web-service methods then communicate the data 
back to user programs. 

2.3 Processing function set 

A set of processing functions are implemented as stored procedures on the database servers, 
referred to as Computational Module in Fig. [TJ Microsoft SQL Server's Common Language 




Figure 2: Radial kinetic energy spectrum, averaged in time between t = and 2.048. 

Runtime (CLR) [31] integration allows these stored procedures programmed in C#. They are 
intended to provide for the users a comprehensive and yet compact set of tools to process the data 
in the database. In turbulence research, the questions being asked and techniques used in data 
analysis are highly specific to the client and often vary from one client to another. Therefore, 
the design of these functions is not straightforward. One has to consider the trade-off between 
generality and efficiency From our experience, we consider the most basic common tasks in 
data analysis are to calculate some basic flow parameters on a set of spatial locations. The basic 
functions supported thus include evaluating the velocity u% and pressure p on arbitrary spatial 
locations, and also their spatial differentiation for first and second derivatives, as well as spatial 
and temporal interpolation. Specifically, the functions allow evaluating the full velocity gradient 
tensor Ay = dui/dxj as well as the pressure gradient dp/dxi. For second derivatives, the 
pressure Hessian d 2 p/dxidxj, Laplacian of velocity V 2 Ui, and full velocity Hessian, d 2 Ui/dx p dx q , 
are supported. Due to the need for spatially localized differentiation stencils, derivatives are 
evaluated using centered finite differencing of 4th, 6th and 8th orders. Spatial interpolation 
is performed using Lagrange polynomial interpolation of various orders (also 4th, 6th and 8th 
order). Temporal interpolation is done using Cubic Hermite Interpolation. Details of these 
functions are provided in Appendix B, C, and D. 



2.4 Data indexing, partition, and workload schedule 

Indexing the spatial data consists of mapping the three dimensional spatial data onto a one 
dimensional array. Maintaining good locality of the spatial data is one of the favorable attributes 
of the mapping. That is, if two points are close to each other in the 3D space, their positions 
in the one dimensional array should also be close. This is clearly an advantage for applications 
accessing coherent regions in the physical space, such as structure identification and filtering. 
In our system, the data is indexed according to the Z-order (also known as Morton-order) [32] . 




Figure 3: Total kinetic energy as function of time. Dashed line is times before ingestion into 
database. Data corresponding to the database is show using solid line between t = and 2.048. 



for it maintains fairly good locality and is simple to implement and calculate. The Z-index of 
each grid point in the computational mesh is calculated from the three-dimensional index of the 
point using bit-interleaving [9]. The concepts of Z-order and Z-index are illustrated in Fig. [5] 
with a two-dimensional grid. For our 1024 3 DNS data, we need 30 bits to index all the points. 
The Z-index of each point has 30 bits in its binary representation. 

The spatial partition and organization of the data is also based on the Z-order of the data. 
To optimize I/O operations, we divide the whole data set into "atoms" of a fixed size (see 
below). The atom is the fundamental unit of I/O operations, i.e., whenever some data is 
needed, a whole atom is read into the memory. The atoms are stored on disk and indexed using 
a standard database B+- tree index. The B+- tree index of an atom is given by the six common 
middle bits of the binary representations of the Z-indices of the data points within the atom 
(see [9] for details). Using these bits as the search key in the B+- tree ensures that the atoms 
that are contiguous in the Z-order are also contiguous in the index. Given the good locality of 
the Z-order, this data structure thus supports fast access to contiguous region in the physical 
space. Similarly, the allocation of the atoms on different data servers is also determined by the 
Z-indices of the data points. 

The size of the atoms is chosen based on two considerations. We have found that using 
an atom of size 64 3 data points as the fundamental I/O unit is the most efficient because it 
balances the memory and disk performance [9] in our application. However, as we mentioned 
before, a frequent operation required by the processing functions is interpolation. For 8th order 
interpolation, a blob of 8 3 data points need to be accessed to obtain the interpolated value at one 
point. Given the high frequency of this operation, it is highly desirable to avoid data transfer 
between different data servers during this operation, which would incur significant overhead. 
This is done by "edge replication" on the atoms. Specically, we divide the data space into 
(1024/64) 3 = 16 3 cubes of size 64 3 . An atom contains a cube in its center, and an edge of length 
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Figure 4: Taylor micro-scale Reynolds number as function of time. Dashed line is times before 
ingestion into database. Data corresponding to the database is show using solid line between 
t = and 2.048. 

4 on each side of the cube. Each atom thus has 72 3 data points, with edges replicated from the 
neighoring atoms. This way, it is guaranteed that each interpolation operation can be finished 
with no more than one atom being read [9]. 

The Z-order indexing scheme is also used to schedule workload. As introduced above, the 
generic job request would be finding the values of some parameter on an array of spatial locations. 
To calculate the parameters at a point, the data in a small region around the point are accessed 
and need to be read from the hard drive. To minimize the amount of data reading, the spatial 
locations, and thus the data queries, are grouped according to the Z-order. Combined with data 
caching, where an atom of data is kept in the memory until another atom has to be loaded, this 
scheduling guarantees the same data atom is loaded into memory only once per job request. 



3 Preliminary numerical tests 

While loading the data onto the database cluster a number of point-by-point checks are made. 
In order to illustrate the usage of the database system, and also as a further check against, 
for example, the potential errors occurred during loading the data into the database, three 
basic groups of quantities are calculated using the database. The first group is the longitu- 
dinal and transverse one-dimensional (ID) energy spectra. The longitudinal spectrum is de- 
fined as En(ki) = (ui(kl)ui(ki)) , where Ui(fci) is the one-dimensional Fourier transforma- 
tion of the x component of the velocity along the x direction. The average is taken over the 
(y,z) plane. Similarly, the two transverse spectra are defined as £"22(^1) = (&2 (^1)^2(^1)) an d 
-^33(^1) = (^3(^1)^3(^1))) where U2 and U3 are one-dimensional Fourier transformations of the 
y and z components of the velocity, respectively. The second group of tests is the PDFs of 
the longitudinal and transverse velocity gradients. The longitudinal velocity gradients are the 
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Figure 5: Table of Z-indices of a two-dimensional 8x8 grid, i and j are the indices of the grid 
points along x and y directions, respectively. The values in the table are the Z-indices of the grid 
points calculated using bit-interleaving. The Z-order curve connects the grid points according 
to Z-order, i.e., in the order of increasing Z-indices. The same order is used to index the data 
in the database. 

components of A^ when i = j, while the others are considered transverse ones. The third one 
is the joint PDF of the two tensor invariants of the velocity gradient R = —AijAjj.A^/3 and 
Q = —AijAji/2[25\. The tear-drop shape of the joint PDF of these quantities is a well-known 
feature of turbulence [251 ED EH E3] . These three basic groups are evaluated in two separate 
ways and compared: The first is in the traditional way, which we will call "in-core" analysis. 
The in-core analysis is done on the cluster where the data are generated, and the raw data are 
the Fourier components of the velocity fields before they are transformed and loaded into the 
database. The velocity gradients are calculated in the Fourier space using spectral derivatives. 
The second approach uses the data stored in the database and uses database queries. In partic- 
ular, the velocity gradients are obtained using the 6th order centered finite differencing option 
(see Appendix B). Interpolation of the velocity is done using 6th-order Lagrange polynomial 
option when necessary (see Appendix C). In order to remove possible differences resulting from 
statistical sampling, data on exactly the same spatial locations are used in both methods. 

The ID spectra are calculated based on the velocity along particular lines across the domain. 
These are obtained by invoking the Web-service method GetVelocity. We take 512 grid lines 
along the x direction at 32 x 16 (y, z) locations. The 32 x 16 locations are distributed uniformly 
on the grids along both y and z directions, i.e., 32 locations along y direction and 16 locations 
along z direction. In each database query, the velocity vectors on the 1024 grid points on one 
of the lines are fetched from the database. Therefore there are 512 database queries in total, 
each fetching 1024 points. For the analysis in this paper, only data at a single time, at t = 0, is 
used. The one-dimensional FFT of the velocity data along each line is performed in the program 
running on the local user machine. The energy spectra are obtained by averaging over all the 
lines. The FORTRAN code snippet in Fig. [6j taken from the program running on the local 
user machine, shows how the database is typically used in an application. As is highlighted in 
the figure with bold font, the Web-service method is invoked from the computational program 
through a wrapper function having the same name GetVelocity, as if it is another function 
implemented on the same computer. The wrapper function takes the time, the number of 



! parameters for database queries 

integer, parameter : : NoSInt = 

integer, parameter : : NoTInt = 

character*100 :: dataset = ' isotropicl024coarse ' // CHAR(O) 

character*100 :: authkey = 'a;****************************' // CHAR(O) 

I other parameters 



! get the velocity on npnt points. 

real, dimension ( 3 ,npnt ) :: points, velocity 

integer : : getvelocity 

real : : time 

! Intialize the gSOAP runtime. 
CALL soapinit( ) 

points ( 1 ,:)=(/( 2 . *PI /npnt*ii , ii=0 , npnt-1 ) / ) 

time=0 . 

do ii=0,nline-l 

points ( 2 , : ) = ( 1024/nliney ) *modulo( ii, nliney ) *dx 

points(3, : ) = ( 1024/nlinez ) *int ( ii/nliney ) *dx 

! database query to find the velocities on the points 
rc=getvelocity (authkey , dataset, time, NoSInt, & 
NoTInt, npnt, points, velocity) 

! FFT of ux and calculate the longitudinal energy spectrum 

call rfftw_f77_one(R2Cld, velocity ( 1 ,:) , uxo) 

uxo=uxo/npnt 

Ellkl(l)=Ellkl(l) +uxo ( 1 ) *uxo ( 1 ) 

do jj=2,npnt/2 

Ellkl( j j )=Ellkl( j j) + uxo( j j)*uxo( j j) + uxo(npnt+2-j j )*uxo(npnt+2-j j) 
end do 
Ellkl(nl)=Ellkl(nl)+uxo(nl)*uxo(nl) 

! FFT of uy and uz and transverse energy spectra 



end do 

Ellkl-2.0 SP*Ellkl/nline 



! output 



! Destroy the gSOAP runtime. 
call soapdestroy ( ) 

Figure 6: Snippet of the FORTRAN code running on local user machine. Bold font highlights 
the lines invoking the Web-service method. The authkey has been intentionally marked out. 

points, the order of Lagrange interpolation to be used, and the x, y, and z coordinates of the 
points as input parameters. The three velocity components on the requested points are returned. 
The C wrapper function, shown in Fig. invokes the gSOAP library to invoke the Web-service 
method implemented on the data access server. Notice that the data points we are using are 
all on the grids. Because there is no interpolation error for the values of the velocity on these 
points, this calculation could be compared with the in-core calculation and serves as a low-level 
check for the correctness of the data loading process. The result is shown in Fig. The results 
calculated from the database are equal to those calculated in-core, suggesting that there was no 
error in loading the data. 

The spectra show a narrow inertial range, in which the longitudinal and the rescaled trans- 
verse spectra overlap. The rescaled transverse spectra fall slightly above the longitudinal one 
towards the viscous range. The two transverse spectra are identical to each other except that 
slight difference can be seen at the lowest wavenumber. These features are all consistent with 
other DNS and experimental results (see, e.g., [33J). As a result of dealiasing, the spectra extend 
only to a wavenumber around 482, as noted before. 
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int getVelocity (char *authToken, 
char *dataset, float time, 
enum Spatiallnterpolation spatial, 
enum Temporallnterpolation temporal, 
int count, float datain[][3], float dataout [ ] [ 3 ] ) 



{ 



int re ; 

struct _turbl GetVelocity input; 

struct _turbl GetVelocityResponse output; 

input . authToken = authToken; 

input .dataset = dataset; 

input, time = time; 

input . spatiallnterpolation = SpatialIntToEnum( spatial) ; 

input .temporallnterpolation = TemporalIntToEnum( temporal) ; 

struct turbl Array0fPoint3 pointArray; 

pointArray. sizePoint3 = count; 

pointArray .Point3 = (void *)datain; 
input. points = & pointArray; 

re = soap_call turb2 GetVelocity ( & jhuturbsoap, 

NULL, NULL, & input, Soutput); 
if (re == SOAP_OK) { 

memepy (dataout , output .GetVelocityResult->Vector3 , 

output .GetVelocityResult-> sizeVector3 * sizeof ( float ) * 3); 

} else { 

fprintf (stdout, "»> Error ... \n" ) ; 

soap_print_f ault ( & jhuturbsoap, stdout) ; 

} 

soap_end(& jhuturbsoap) ; 

soap_done(& jhuturbsoap) ; 

return re; 
} 

Figure 7: The C wrapper interface for the FORTRAN code. It invokes the gSOAP libraries to 
invoke the Web-service method. 

To illustrate the effects of Lagrange polynomial interpolation, another case with 512 lines and 
4096 points on each line is also calculated. The lines are chosen to be the same as in the previous 
case. The 4096 points are uniformly distributed on each line, and one in every four points falls 
on the grids. The 6th order Lagrange interpolation scheme is used. As is shown in Fig. [9l the 
range of the spectra extends beyond the maximal resolved wavenumber in the simulation (about 
482) to a value of around 2000. The oscillating lobes observed between wavenumbers 482 and 
2000 are the spectral signature of the Lagrange interpolant polynomials. As a consequence of 
the smoothness of the interpolants, very little energy is contained in high wavenumber lobes, as 
one can see by comparing Fig. [8] and [9j 

The PDFs of the velocity gradients and the joint PDF of R and Q are calculated in a 
similar way, except that the Web-service method invoked is GetVelocityGradient, instead of 
GetVelocity. The database queries return values for each element of the velocity gradient 
tensor, at every point requested. The velocity gradient tensor elements are calculated inside the 
database. In this case data on 256 3 , or about 17 million, spatial locations are requested. The 
points are again on the grids, and are equally spaced along each direction. 

The results are plotted in Figs. [10]and[TTJ Also plotted are the results calculated in-core, in 
which the velocity derivatives are calculated from the Fourier components of the velocity field, 
and then transformed into physical space. The figure shows that there is very good agreement 
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Figure 8: One-dimensional energy spectra calculated through database queries (lines) and cal- 
culated in-core (symbols). Solid line and squares: Eu(ki), dashed and diamonds: (3/4).E22(&i)j 
dash-dotted and gradients: {?>/4)E^{ki). Thin dashed line has slope —5/3. 

between the results. The minor differences are due to slight differences in the 6th-order finite 
difference and the spectral evaluations of the gradients. The results are also in good agreement 
with the results reported in the literature (see, for example, [33] and [24j). 

The time taken to calculate the energy spectra is about 16 minutes for 512 x 1024 points, 
and 26 minutes for 512 x 4096 points. These numbers are quite encouraging for an initial 
implementation that has much room for additional optimizations. 

To provide a more complete characterization of the presently achievable access speeds as 
function of the size of the request, additional experiments are carried out. Calls to the database 
are issued over the Internet from a FORTRAN program running on a desktop computer at JHU. 
Access times are measured as function of the number of points, N„, at which data are requested. 
Three spatial arrangements of the points are considered: (a) N p points arranged on a cubic lat- 

1/3 

tice, with ~ N p points on each side; (b) all N p points along a single line, with points separated 
by 2w/N p ; and (c) N p positions at random positions over the entire domain (uniform probability 
density for each coordinate between and 2tt). Figures [L2| 1131 and 1141 show the resulting access 
times in seconds, as function of N p for the three cases, respectively. For each spatial arrange- 
ment of the points, four types of data are requested. (1) three components of velocity (using 
GetVelocity), without spatial nor temporal interpolation; (2) nine velocity gradient tensor ele- 
ments (using GetVelocityGradient), evaluated using 8th-order centered finite differencing but 
without spatial nor temporal interpolation; (3) pressure Hessian (using GetPressureHessian), 
evaluated using 4th-order centered finite differencing and 4th-order Lagrange Polynomial inter- 
polation in space, by without temporal interpolation; and finally (4) velocity vector and pressure 
(using GetVelocityAndPressure), with 6th-order Lagrange Polynomial interpolation in space 
and Piecewise Cubic Hermite Interpolation Polynomial method for time. The corresponding 
access times as function of N p are indicated with different symbols in Figs. [T21141 

In case (a), not surprisingly, the function evaluation using both spatial and temporal inter- 
polation is the most time-consuming while the one without any interpolation is fastest. In this 
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Figure 9: One-dimensional energy spectra calculated through database queries using 4096 data 
points on each line with spatial interpolation. Line patterns same as in Fig. [8) 

case, when the number of points requested increases, the access time increases as ~ N®' 9 , i.e. 
almost linearly. In cases (b) and (c), there is not as much difference as in case (a) among the 
function evaluations without temporal interpolation, but the one with temporal interpolation 
still takes more access time. In the latter two cases, the access time increases as ~ N®' 4 for N p 
above ~ 3000. Comparing the three cases, we can observe that when points are arranged as 
cubes, it takes the least access time for moderate size requests. The reason is that much of the 
data then typically falls within only a few of the stored atoms, decreasing the amount of data 
that needs to be read from disks. As can be seen, access times depend upon the details of the 
data arrangement, interpolation type and the size of the request. Of course, it also depends on 
the client's network speed, etc. 

4 Analysis of the advected delta- vee system 

In this section, the database is used to analyze the advected delta-vee system derived in Ref. 
[10] . For the convenience of the readers, the derivation of the system is briefly reviewed first. 
We restrict ourselves to dynamics in three dimensional space. 

The advected delta-vee system decribes the Lagrangian evolution of the longitudinal and 
transverse velocity increments (or more precisely, the longitudinal and transverse velocity gra- 
dient multiplied by a small and fixed displacement). Specifically, for an incompressible velocity 
field Uj(x, t) filtered at scale A and velocity gradient Aij = dui/dxj, consider a displacement 
vector r(t), and the unit vector in its direction f = r/r. The longitudinal and the magnitude of 
the transverse components of the 'velocity increment' vector along this direction across a fixed 
distance £ < A are defined as 



Su = £ A T 



I An 



Sv 



ik r k n, ov = £ \Pij(r) A jk r k \ , 

where A rr = An-vi-Vi is the velocity gradient along the direction f, and Pij(r) = Si 
projection operator. For a detailed sketch illustrating these definitions, see Fig. 1 of Ref. [TO 



(1) 



fifj is the 
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Figure 10: PDF of longitudinal (solid line and diamonds) and transverse (dashed line and 
squares) velocity gradients, normalized by rms values. Lines are calculated from database queries 
and symbols are from in-core calculations. The dotted line is Gaussian distribution. 



Using the equations for the line element r^ and the filtered velocity gradient Aij, one can 



derive from the definitions of 5u and 5v the following equations (for details see jlUl lllj ): 

dSu 1 



dt 
d5v 

~dT 



-5u z 



-1 + Sv 2 r 1 + Q~l + Y, 



-2 5u 5v r 1 + Z, 



(2) 
(3) 



where d/dt indicates material derivative. Also, Y = IHijfifj and Z = tHijfjei, in which e is 
a unit vector in the direction of the transverse velocity-increment component (perpendicular to 
r) and Ha - -(%&- ^ijd 2 kk p) - (d 2 k T ik - \5ijd 2 k T lk ) + ud 2 k Aij + (d s Ji - lS ij dJ i 



anisotropic part of the gradients of pressure, sub-grid scale, viscous, and external forces, with 



Hij is the 



T, 



'./ 



UiUj 



being the sub-grid scale (SGS) stress. Q is defined as Q 



ij jil 



25u 2 /3£ 2 (see pi] for details). 

Eqs. [2] and [3] are called the "advected delta- vee system" . In the system, the nonlinear self- 
interaction terms are closed, while Q~ , Y and Z are not. As shown in [11], the truncated system 
retaining only the nonlinear self-interaction terms is already able to predict a number of well- 
known, important intermittency trends. Nevertheless, the unclosed terms have to be included 
through physically realistic models in order for the system to predict quantitative features of 
intermittency and to reach stationary statistical states. As the first step in constructing such 
models, here we will now analyze the unclosed terms using the DNS data in the database. Since 
filtering operations are not yet supported by the database, the analysis is restricted to unfiltered 
DNS data and the results pertain to velocity increments across distances on the order of the 
Kolmogorov scale. Therefore the term corresponding to SGS force in Hij will be absent, and 
the filtered quantities will recover their unfiltered values. Hence we will drop in the following 
the overbar in the symbols. For example, Aij will be replaced by Aij and so on. 

The analysis is based on the evolution equation for the joint PDF of 6u and Sv, which has 
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Figure 11: The contour plot of the joint PDF of R and Q. Dashed lines: in-core calculations. 
Grey-scaled contours: database queries, in which R and Q are calculated on the local machine 
with the velocity gradient data fetched from the database. 



been derived in appendix [ 

dP d 

dt d5u 



(5v 2 



l -5u 2 )P 



+ A-A- 2Su5vp ] + A;[<Q~\ Su MF} 



d5v 



d5u 



+ ^-[(Y\ 6u, Sv) P] + — [( Z\ 5u, 5v) P] + 3(8u - {5u))P = 0. 
oou odv 



(4) 



In Eq. [U P = P(5u, 5v; t) is the joint PDF of 5u and 5v at time t. The equation describes 
the balance produced by three different types of terms: the unsteady term dP/dt, the measure 
correction term S = 3(5u — (5u))P, and the probability flux terms. For stationary turbulence, 
the unsteady term is zero. To simplify later analysis, we then write V • W + S = 0, with 
W = W sc + Wq + W p + W v + Wf. The following symbols denote the probability fluxes caused 
by different physical mechanisms: 



W sc = [5v 2 - 5u 2 /3, -25u5v]P, 
W p = [(Y p \5u,5v),(Z p \5u,5v)]P, 
W f = [{Y f \5u,5v),(Z f \5u,5v)]P, 



W Q = [(Q-\5u,5v),0]P, 

W v = [{Y v \Su,5v),{Z v \5u,5v)]P, 



(5) 



in which Y and Z have each been separated into three parts: 



5p = -^i4P + ( 1 /3)^ fc P, Z p 



vfifAA 



Y, 



Y f = fifjdjf, 



a/3R/ fe , 



Z v 

z f 



-eifjdfjp, 

ueirjd kk Aij, 



(6) 
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Figure 12: Access times (in seconds) measured as function of request size (number of points N p ), 

1/3 

when the N p points are arranged on a cubic lattice of size ~ N p ' . Different symbols denote 
different request types, including GetVelocity, Get Velocity Gradient, GetPressureHessian, and 
GetVelocityAndPressure. Different combinations of order of interpolation and differentiation 
are also tested as indicated in the figure. 

These terms represent, respectively, the effects of pressure, viscous diffusion, and external force. 
Note that in the Z's the contributions from the isotropic parts of the tensors are zero because 
r is orthogonal to e. The same is true in Y v due to incompressibility. 

The joint pdf P(5u, 5v) and the flux terms in the above joint PDF equation are calculated 
using the turbulence database described in previous sections, except for the flux produced by 
the forcing term, which is not included in the analysis. The forcing term can not, at this stage, 
be obtained directly from the database. But since the forcing only occurs at large scales its 
direct effect on the terms related to the small-scale dynamics of velocity gradient are negligible, 
and its omission does not alter the conclusions to be drawn. Also, when calculating the viscous 
diffusion term W v , X7 2 Aij is needed, which entails third order differentiation of the velocity 
field. This is a rather specific operation and, for now, was not considered generic enough to 
be included in the processing function set of the database. Therefore, in this analysis V 2 Aij is 
calculated on the local client-side computer using the data of Ay obtained from database queries 
at various grid points, and then evaluating the 4th order central finite difference approximation 
to the Laplacian V 2 . The (5u,5v) phase-space is binned into 80 x 80 boxes and conditional 
averages are computed by averaging over about half of a million randomly chosen datapoints, 
and 50 random directions of r at t = 0. 

The probability flux generated by the closed self-interaction terms is plotted in Fig. [15] The 
contour plot layered under the vector plot is the joint PDF of 5u and 5v . The joint PDF displays 
the correct skewness towards negative 5u direction. By construction, the stream traces in Fig. [T5l 
have the same shapes as the phase orbits of the truncated system (with Q~ , Y, and Z neglected), 
as plotted in Fig. 5(b) in [11]. As explained in [10], [TT|, due to the negative quadratic term 
— Su 2 /3 in Eq. [2] negative values of 5u are continuously amplified, while positive fluctuations 
tend to be decreased. This mechanism, referred to as the self-amplification of the longitudinal 
velocity increments, is thus responsible for generating the negative skewness in 5u. On the other 
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Figure 13: Access times (in seconds) measured as function of request size (number of points 
N p ), when the N p points are arranged along a single line with a separation between points of 
2n/N p . Different symbols are the same as in Fig. [121 

hand, the right-hand side of Eq. [3] becomes positive when Su is negative. Therefore, a large 
negative value of Su will produce exponential growth in Sv, thus leading to strong fluctuation 
in Sv. This mechanism for the generation of intermittency in the transverse velocity increment 
is called the cross amplification mechanism. The stream traces in Fig. [15] give an instructive 
illustration of the coupling between the two mechanisms. This figure, however, also shows 
that the self-interaction terms alone can not maintain a stationary distribution in P(Su,Sv). 
We expect that the other terms will tend to oppose the excessive growth of the intermittency 
produced by these two terms and enable the system to reach a stationary distribution. 

Plotted in Fig. [16] through Fig. [18] are the probability fluxes generated by the Q~ term, the 
anisotropic pressure Hessian term, and the viscous diffusion term, respectively. By definition, 
the Sv component of the flux Wq is zero, therefore all the vectors in Fig. [16] are parallel to 
the Su axis. The lines in the figure are contours of the distribution of the non-zero (horizontal) 
component of the probability flux vector. Separated by the zero contour, the vectors in the 
lower right corner point to positive Su, while those above point to the negative direction. This 
combination shows a general tendency for the Q~ term to oppose the effect of the self-interactions 
terms. Note however, in the negative Su half plane, the vectors pointing to the negative Su 
direction would strengthen the negative amplification of Su, and thus increase the intermittency 
in 5v due to the cross-amplification mechanism explained before. On the other hand, the stream 
traces of the flux generated by the pressure Hessian, shown in Fig. [P7] have a similar shape as 
those in Fig. [T5] but with opposite direction. It indicates that the main effect of the pressure 
Hessian is to reduce the intermittency generated by the self-interaction terms. Besides, we 
note that the stream traces show an interesting sink-source structure. Fig. [18] shows that the 
contribution of viscous diffusion is mainly to introduce a nearly linear damping effect, i.e., all 
vectors pointing to the origin. 
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Figure 14: Access times (in seconds) measured as function of request size (number of points 
N p ), when the N p points are distributed randomly in the entire dataset. Different symbols are 
the same as in Fig. [I2j 

5 Conclusions 

In this paper, a new turbulence database system has been described. By applying database 
technologies, the system enables remote access to a DNS data set with 27 Terabytes, encapsu- 
lating a complete 1024 4 space-time history of forced isotropic turbulence. The architecture of 
the system, including the user interface, processing function set, data indexing and partition, 
and work load scheduling, is explained. Test cases are presented to demonstrate the usage of 
the system, and to check against possible errors. The database thus provides a new platform 
for turbulence research, and a useful prototype for building large-scale scientific databases. 

As an example for the application of the database, the advected delta-vee system introduced 
in |1Q|, II lj is analyzed by accessing the database from a remote desktop computer. The analysis 
shows that the main effect of the unclosed terms in the system is to oppose the excessive growth 
of intermittency in the velocity increments produced by the nonlinear self-interaction terms. 
The various terms (isotropic pressure Hessian effect, deviatoric pressure Hessian term, and 
viscous term) show markedly different behaviors among themselves. While the detailed physics 
underlying these observations are not yet clear, the results imply that when closure models are 
constructed for these terms, they should be modeled separately. Specifically, the viscous term 
may be modeled using a simple relaxation towards the origin, but the pressure terms will require 
more elaborate closures. 
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Figure 15: The vector plot of W sc , the probability flux generated by the self-interaction terms. 
Several stream traces are plotted. The contour plot is P(5u,5v), the joint PDF of 5u and 5v. 
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Appendices 

A Equation for the joint PDF of the longitudinal and transverse 
velocity increments 

In this appendix, we will derive the equation for the joint PDF of 5u and 5v, from the dynamic 
equations for 5u and 5v (Eqs. [2] and [3]), and the equation for the length of the line element r{t): 



dr 

— = ou r 
dt 



(7) 



Recall that the velocity increments are defined over a distance £ along the directions of evolving 
line elements r. As noted in [10|. II lj . during their evolution the line elements tend to concentrate 
along the stretching directions of the flow field. Therefore, the joint PDF defined with equal 
weight on each trajectory in the (Su, Sv) phase space will put more weight on the velocity 
increments along the stretching directions. In order to obtain unbiased statistics, a measure 
correcting procedure has been proposed in [10] , Based on mass conservation, it was shown that 
when accumulating the statistics of (Su, Sv) at time t, each trajectory has to be weighed with a 
factor [ro/r(t)] 3 , where r^ is the initial length of the line element. In terms of the joint PDF of 
Su and Sv, it implies that, assuming all the line elements initially having the same length £, the 
unbiased joint PDF of Su and Sv can be formally written as 



P(Su,5v;t) 



1 



I(t) 



+oo 



Ps(Su,Sv,r;t)r 3 dr, 



(8) 
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Figure 16: The vector plot of Wq, the probability flux generated by the Q term. By definition, 
all the vectors are parallel to Su axis. Several contours of the magnitude of the vectors are plotted. 



in which P 3 (5u,5v,r) is the traditional joint PDF of the three independent variables. I(t) is a 
normalization factor, defined as: 



I(t) 



oo r+oo r+oo 



(9) 



P 3 (Su, 5v , r; t)r d5ud5vdr. 

'-oo JO JO 

From these definitions, the equation for P can be derived from the Liouville equation for P 3 



f)P f) f) f) 

~dt + ml {{6 ^ 6u ' 6v ' r) Ps) + dfo^ 6u ' 6v ' r) Ps) + fr {{ ^ 6u ' 6v ' r) Ps) = °' (10) 

where a = da/dt means material derivative. Multiplying Eq. [10] by r -3 and integrating over r, 
one obtains 



dI(t)P d 



01 



d5u 



(5u\Su,5v,r) P 3 r dr 



+ 



d 
85v 



+00 



(5v\8u,Sv,r) P 3 r dr 



r+oo Q 

+ / r -—((r\Su,5v,r) P 3 )dr = 0. 
Jo dr 



By straightforward calculation, the first two integrals can be reduced to: 

I(t){Su\Su,Sv)P(5u,Sv) 

and 

I(t)(Sv\6u,Sv)P(6u,Sv) 

respectively. Using Eq. El the third one becomes 

r+oo a r+oo 

/ r- 3 — (5urP 3 )dr = r- 2 5uP 3 \+ QO - / 5urP 3 (-3)r~ 4 dr = 35uI(t)P, 
Jo or J 



(11) 

(12) 
(13) 

(14) 
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Figure 17: The vector plot of W p , the probability flux generated by the anisotropic part of the 
pressure Hessian. 

assuming 5uPz(5u,5v,r) approaches zero faster than r 2 for any Su, Sv when r — ► 0. Thus, one 
obtains 



dI(t)P 
dt 



+ I(t) 



-—((Su\Su,Sv)P) + —({Sv\Su,Sv)P)+3SuP 

OOU GOV 



0. 



(15) 



Integrating the above equation over Su and Sv, the normalization condition for P yields 



^P- + 31 {t) f * f SuPdSudSv = ^P- + 3I(t)(Su) = 0, 

dt J-oo Jo 



dt 



(16) 



if (5u\ 5u, 5v) P and (5v\ 5u, 8v) P tend to zero at the boundaries. Thus 



dP _ 1 dI{t)P P dl{t) 
~dt ~ T(f)~ di I(t) dt 

d 







. ((8u\ 6u, Sv) P) + -^-({5v\ Su, 8v) P) + 35uP 

OOU GOV 

+3(5u)P. 
Replacing 5u and 5v with Eq. [2] and [3] and rearranging the terms finally yields 



dP d 

+ 



+ 



5v 2 - -5u 2 ) P 
d 



+ 4ri- 25uS vP] + A;l(^\ 5u > 6v ) p } 



d5v 



d5u 



dt d5u 
^- [(Y\ Su, 5v) P] + tt^ [(Z\ Su, Sv) P] + 3(5u - {5u))P = 0. 

OOU GOV 



(17) 



(18) 



This is the equation for the joint PDF of 5u and Sv. The last term on the LHS of the equation 
comes from the measure correction procedure. Note that we have assumed £ = 1. 



21 




Figure 18: The vector plot of W v , the probability flux generated by the viscous diffusion term. 

B Documentation: database spatial differentiation 

In this appendix, details are provided about the various options for spatial differentiations 
implemented in the database cluster. Below, / denotes any one of the three components of 
velocity, u, v or w, or pressure, p, depending on which function is called. Ax and Ay are the 
width of grid in x and y direction. 

B.l Options for GetVelocityGradient and GetPressureGradient 



x n4 



x n3 



x n2 



x n-l 



x m-l 



x im-2 x iw-3 



x iw4 



Figure 19: Illustration of data points along x direction. The same approach is used in the y and 
z directions. 



FD4: 4th-order centered finite differencing 

With the edge replication of 4 data-points on each side, this option can be spatially interpolated 
using 4th-order Lagrange Polynomial interpolation. 



dx 



3Ax 



[f(x n+1 ) - /(x n _i)] 



1 
12Ax 



[f{x n+2 ) - f(x n - 2 )] 



+0(Ax 4 



(19) 
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FD6: 6th-order centered finite differencing 



d£ 

dx 



4Ax 



[f(x 



n+1, 



f(Xn-l)] 



20Ax 



[f{x n+2 ) - /(x„_ 2 )] 



+ 



1 



60Ax 



[/(x n+3 ) - /(x„_ 3 )] + 0(Ax 6 ) 



(20) 



FD8: 8th-order centered finite differencing 



With the edge replication of 4 data-points on each side, this is the highest-order finite difference 
option available. 



df_ 

dx 



5Ax 



[f(x n+1 ) - f(x n -i)] 



1 



5Ax 



[f(x 



n+2) 



f{x n -2J\ 



+ 105Ax 

+0{Ax s 



[f(x 



n+3j 



f(Xn-3)] 



1 



280Az 



[f(Xn+4) ~ f(x n -4)] 



(21) 



B.2 Options for GetVelocityLaplacian, GetVelocityHessian and GetPressure- 
Hessian 

In this section, second derivatives finite difference evaluations are shown. The expressions are 
given for derivatives along single directions in terms of the x-direction, and mixed derivatives 
are illustrated on the x-y plane. The same approach is used in the y and z directions, as well as 
in the x-z and y-z planes for the other mixed derivatives. 

FD4: 4th-order centered finite differencing (can be spatially interpolated using 
4th-order Lagrange Polynomial interpolation) 



dx 2 



\£m iVn) 



3Ax : 



:[f{x m+ i,y n ) + f(x m -i,yn) - 2/(x m , y n )} 



1 



12Ax 2 
+0(Ax 4 



[f(Xm+2, Vn) + f{Xm-2,y n ) ~ 2f(x m ,y n )] 



(22) 



d 2 f 



dxdy 



1 



\JE>m iVn) 



■[f(x m +i,y n +i) + f(x m -i,y n -i) 



3AxAy ' 

-f(x m+ i,y n -i) - f(x m -i,y n +i)] 
i 

■[f(x m+ 2,y n +2) + f(x m -2,y n -2) 



48AxAy ' 

-f(x m+2 ,y n -2) - f(Xm-2,y n +2)] 

+0{Ax\Ay i ) 



(23) 
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FD6: 6th-order centered finite differencing 



dx 2 



[Xm iVn) 



2Ax 2 
3 



[f(x m+ i,y n ) + /(x m _i,y„) -2f(x m ,y n )] 
[f{x m +2, y n ) + f{x m -2,y n ) - 2f(x m ,y n )] 



20Ax 2 ' 

+ Q0Ax2 if( X m+3,yn) + f(x m -3,y n ) -2f(x m ,y n )] 

+0(Ax 6 ) 



(24) 



d 2 f 



dxdy 



\%m iVn) 



■[f(xm+i,y n +i) + f(x m -i,y n -i) 



8 Ax Ay l 

-f(x m +i,yn-i) ~ f(x m - 1 ,y n+1 )] 

-[f(x m+ 2,y n +2) + f(Xm-2,y n -2) 



80Ax Ay 

-f{Xm+2,y n -2) ~ f(x m -2,y n +2)} 



+ 



1 



-[f(x m+3 ,y n+3 ) + f(x m -3, yn-s) 



360AxAy ' 
-f(x m+ 3,y n -3) - /(x m _3,y n+3 )] 
+0(Ax 6 ,Ay 6 ) 



FD8: 8th-order centered finite differencing 



dx 2 



792 



\%m iVn) 



591Ax 2 
207 



2955Ax 2 
104 

~ 8865Ax 2 
9 

+ 3152Ax 2 

+0(Ax 8 ) 



[f(x m+1 ,y n ) + f(x m -i,y n ) -2f(x m ,y n )] 
[f(x m+2 ,y n ) + f{x m -2, y n ) - 2f(x m , y n )] 



[f(Xm+3,yn) + f(x m -3, Vn) ~ 2f(x m , y n )\ 

[f(x m +4, y n ) + f{x m -A, y n ) - 2f(x m , y n )\ 



(25) 



(26) 
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d 2 f 



14 

; I = ik A A if( x m+l,yn+l) + /(aW-1, 2/w-l) 

dxd V (x m ,vn) 35 Ax Ay 

-f(x m +i,Vn-i) - f(x m -i,y n +i)] 

~ onA A [f( x m+2,y n +2) + f(x m -2,y n -2) 

zUAxAy 

-f(Xm+2,yn-2) ~ f(x m -2, Vn+2)] 

2 
+ 3l5AxA~^ <yXm+3,yn+3 " > + f( Xm ~ 3 > yn -'^ 

-f(x m +3,Vn-3) ~ f(x m -3,yn+3)] 
~ 2240AxA ^( Xm+4 '^ n + 4 ) +/( X m-4,2/n-4) 

-f{x m +4,y n -A) - f(x m -A,y n +4,)] 

+0(Ax 8 ,Ay s ) (27) 

C Documentation: Database Spatial Interpolation Options 

In this appendix, details are provided about the various options for spatial interpolation imple- 
mented in the database cluster. 



x n3 x n2 x nl x n x nfl x ih-2 x ih3 x iw4 

Figure 20: Illustration of Lagrangian interpolation. 

C.l Interpolation Options for Get Velocity and GetVelocityAndPressure 

In this section, / denotes any one of the three components of velocity, u, v or w, or pressure, 
p, depending on which function is called. Ax, Ay and Az are the width of grid in x, y and z 
direction, x' = (x',y',z'). 

NoSInt: No spatial interpolation 

In this case, the value at the datapoint closest to each coordinate value is returned, rounding 
up or down in each direction. 

/(x') = f(x n ,y p ,z g ) (28) 

where n = int(j^ + }-) lP = int{^ + \),q = int{-^ + \). 
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Lag4: 4th-order Lagrange Polynomial interpolation 

In this case, 4th-order Lagrange Polynomial interpolation is done along each spatial direction. 

4 4 4 

/( x = J2J2J2f(Xn-2+i,y P -2+j,Z q -2+k) 
i=l j=l k=l 

•zr 2 +VHr 2+ vwr 2+ V) (29) 



n+2 



n (*' 



W') = '^^ (30) 

n & - 6j) 

j=n-l,j^i 

where can be x, y, or z. 

Lag6: 6th-order Lagrange Polynomial interpolation 

In this case, 6th-order Lagrange Polynomial interpolation is done along each spatial direction. 

6 6 6 

/( x = ^^Z^Zfi x n-?,+i,yv-?>+i^ z q-?>+k) 
i=l j=l k=l 

■Q- 3+i ( a /)-i*r 3+j b/)-ir 3+k (*') (3i) 

n+3 

n (*' - 0j) 

W) = ^^ (32) 

n (Oi-ej) 

j=n-2,jj£i 

where 6 can be x, y, or z. 

Lag8: 8th-order Lagrange Polynomial interpolation 

In this case, 8th-order Lagrange Polynomial interpolation is done along each spatial direction. 

8 8 8 

Z( x ') = ^2^2^2f( x n-4+i,y P ~4+j,Z q -4+k) 
i=\ j=l fe=l 

■l n x - A +\x')-ll- 4+ i{y')-ir^\z') (33) 



n+4 



n (<?' 



where 6 can be x, y, or z. 



W) = ^^T^ (34) 

n {Bi-Qj) 

j=n-3,j^i 
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C.2 Interpolation Options for Get Velocity Gradient, GetPressureGradient, 
GetVelocityLaplacian, GetVelocityHessian and GetPressureHessian 

In this section, / denotes velocity or pressure gradient, velocity or pressure Hessian, or velocity 
Laplacian, depending on which function is called. 

FD4NoInt, FD6NoInt, FD8NoInt: No spatial interpolation 

In this case, the value of the 4th, 6th or 8th order finite-difference evaluation of the derivative 
at the datapoint closest to each coordinate value is returned, rounding up or down in each 
direction. 

/( x = f(x n ,y P ,z q ) (35) 

where n = int(j^ + \),p = int{j^- + I), q = int(-^ + \). 

FD4Lag4: 4th-order Lagrange Polynomial interpolation of the 4th-order finite dif- 
ferences 

In this case, the values of the 4th order finite-difference evaluation of the derivative at the data 
points are interpolated using 4th-order Lagrange Polynomials. 

4 4 4 

/( x ') = ^2J2J2^ Xn - 2 + hy p- 2 +^ z i- 2 +^ 

i=l j=i k=X 

•zr 2 +Vwr 2+j Vwr 2+ V) (36) 



n+2 

n (*' - 9j) 

j=n-l,j^z 

n+2 

n v< - o,) 



(37) 



where 6 can be x, y, or z. 

D Documentation: Database Temporal Interpolation Options 



%1 t n tjy.1 tjy.2 

Figure 21: Illustration of data points for time. 



In this section, / denotes velocity, pressure, velocity or pressure gradient, velocity or pressure 
Hessian, or velocity Laplacian, depending on which function is called. At is the time increment 
between consecutive times stored in the database. 
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NoTInt: No temporal interpolation 

In this case, the value at the datapoint closest to the time value is returned, rounding up or 
down. 

fit') = f(t n ) (38) 

where n = int{j^ + |). 

PCHIP: Cubic Hermite Interpolation in time 

The value from the two nearest time points is interpolated at time t' using Cubic Hermite Inter- 
polation Polynomial, with centered finite difference evaluation of the end-point time derivatives 
- i.e. a total of four temporal points are used. 

f{t') = a + b(t' - t n ) + c(t' - t n ) 2 + d{t' - t n f(t' - t n+l ) (39) 

where 

a = f{t n ) 





b 


= 


/(tn+i) - fit n 

2At 


-l) 








= 


fit 


„+i) - 2/(t„) + 


fitn 


-l) 








2At 2 










-f(tn 


-l) 


+ 3/(tn)-3/(t, 


^+l) 


+ /(*« 


+2) 



2At 3 
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